Assess how much the LISTING_CTR_CODE impacts time to offer. The hypothesis is that more remote centers (less nearby donors) have longer wait times and consequentially worse waitlist outcomes. Another reason could be the center specifying tight range of when to make a match run (e.g., tight donor/candidate weight ratio).
Assess a center’s acceptance practice. The hypothesis is that centers that are more selective in donors will have longer wait times until transplants (due to the more frequent refusals) and consequentially worse waitlist outcomes.
Settings and Functions
Load all required data
Assess how much the LISTING_CTR_CODE impacts time to offer.
This section builds a cox-ph model to estimate the center impact on time to offer. The model controls for several candidate variables that could impact time to offer: Age, Weight, Status, Blood Type, and Year.
The Listing Center ID is treated as a random effect and the estimated coefficients are used to assess how much longer or shorter their candidates have to wait for an offer. There are two main reasons for center effects: the geographic location of the listing center and how narrowly it determines it wants to be notified for an offer (e.g., the min and max weight range of the donor to be placed on match run).
There is one record per candidate related to the first time the candidate was placed on active status (not temporarily inactive).
Candidate population:
On a HR waitlist (not HL)
Under 18 at time of offer
Added to the waitlist between 2010-01-01 and 2020-12-31
Note: we are using time from waitlist to the match run time when donor was offered to candidate. A better value would be the actual offer time, but the initial response data is not very reliable (e.g., there are some delays in entering the initial response time).
Calculate waitlist time at status
Only consider candidates on HR waitlists (not HL or LU)
Get time candidate added to waitlist
Get time ranges candidate is at each status
Get time candidate removed from waitlist
There can be multiple entries for each waitlist corresponding to multiple status changes during the patient’s time on waitlist
Code
WL_STATUS = WL_hist %>%#: only consider HR waitlist (not HL)filter(ORG =="HR") %>%#: calculate time at statusgroup_by(WL_ID_CODE) %>%arrange(CHG_DT) %>%filter(lag(STATUS) != STATUS |is.na(lag(STATUS)) | CHG_TY =="D") %>%transmute( WL_ID_CODE, STATUS, CHG_TY, # type of status change (A = Added, M = Modified, D = removal)WL_DT = CHG_DT,# date-time of start in new status (or added)# number of days at status (before change or removal)days_at_status =as.numeric(difftime(lead(CHG_DT), CHG_DT, units ="day")),seq =min_rank(CHG_DT), # sequence of status changesfinal = 1L*(lead(CHG_TY) =="D"), # indicator of final status change ) %>%ungroup() %>%#: Remove observation related to waitlist removalsfilter(CHG_TY !="D") %>%arrange(WL_ID_CODE, WL_DT)
Join the PTR (offer) data to the waitlist data
Only consider candidates < 18 (at time of offer).
Adds time of next offer the candidate received for each waitlist listing
If the MATCH_RUN_DT is after the next status change, then the offer doesn’t occur in the given waitlist-status time (i.e., censoring)
Outcome Variables:
wait_days is the number of (fractional) days from waitlist change to next offer
event is the binary indicator that an offer occurred before the next waitlist status change. If event = 1 then an offer was given to candidate before the next status change and event = 0 if the status change occurred first (i.e., censoring).
time is the elapsed time from status change to either offer or another status change. This is the main variable for modeling time to event.
Code
#: Join the PTR (offer) data to the waitlist dataWL_PTR =inner_join(x = WL_STATUS, y = PTR_all %>%filter(AGE_PT <18) %>%# only keep candidates < 18select(WL_ID_CODE, LISTING_CTR_CODE, MATCH_SUBMIT_DT, AGE = AGE_PT, PTR_ID),by =join_by("WL_ID_CODE", closest(WL_DT <= MATCH_SUBMIT_DT)) ) %>%mutate(wait_days =as.numeric(difftime(MATCH_SUBMIT_DT, WL_DT, units ="day")),event =ifelse(days_at_status >= wait_days |is.na(days_at_status), 1L, 0L), time =pmin(wait_days, days_at_status, na.rm=TRUE) )
Make data to model time until offer
We only keep one record per candidate/waitlist so we don’t overrepresent the candidates that had many refusals and offers. While there are few options, we decided to keep the time to event for the candidate’s first active waitlist. Some candidates are initially listed as temporarily inactive.
Predictor Variables:
day is the number of (fractional) days since 2010-01-01. Used to account for temporal trends in time to offer.
ABO is blood type
WEIGHT_KG is candidate’s weight (at time of listing)
AGE is the candidate’s AGE (at time of offer)
STATUS is the candidate’s waitlist status (1A, 1B, 2) at time of offer.
Code
#: get relevant data from thoracicthor = thoracic %>%transmute( WL_ID_CODE, WEIGHT_KG =coalesce(INIT_WGT_KG_CALC, WGT_KG_TCR), ABO =recode(ABO, A1 ="A", A2 ="A", A1B ="AB", A2B ="AB", .missing ="Unknown") )#: make data for modelingdata_cox = WL_PTR %>%#: get first *active* status for each candidate/waitlistfilter(STATUS !="Temporarily Inactive") %>%slice_min(WL_DT, by = WL_ID_CODE, with_ties =FALSE) %>%#: date rangesfilter(as.Date(WL_DT) >=as.Date("2010-01-01"), as.Date(WL_DT) <=as.Date("2020-12-31"), ) %>%#: Add predictor variablesleft_join(thor, by ="WL_ID_CODE") %>%# add thoracic data#: Select relevant data for modeling transmute(# waitlist info WL_ID_CODE, WL_DT,# outcomes time, event,# predictors WEIGHT_KG,LISTING_CTR_CODE =factor(LISTING_CTR_CODE), STATUS =factor(STATUS), AGE,day =as.numeric(difftime(WL_DT, as.Date("2010-01-01"), units ="day"))-1,ABO =factor(ABO) )summary(data_cox)
WL_ID_CODE WL_DT time event
Min. : 5088 Min. :2010-01-04 17:20:02.50 Min. : 0.0004 Min. :0.0000
1st Qu.:1082778 1st Qu.:2013-03-04 16:34:08.85 1st Qu.: 3.1416 1st Qu.:1.0000
Median :1245614 Median :2015-11-10 17:16:40.97 Median : 9.4437 Median :1.0000
Mean :1243462 Mean :2015-10-08 22:02:05.31 Mean : 20.6929 Mean :0.8073
3rd Qu.:1407406 3rd Qu.:2018-06-27 16:50:42.75 3rd Qu.: 24.5026 3rd Qu.:1.0000
Max. :1572456 Max. :2020-12-31 20:38:07.99 Max. :715.2160 Max. :1.0000
WEIGHT_KG LISTING_CTR_CODE STATUS AGE
Min. : 2.000 19437 : 298 Status 1A:3927 Min. : 0.01046
1st Qu.: 6.225 00124 : 278 Status 1B:1071 1st Qu.: 0.57462
Median : 14.800 00248 : 258 Status 2 :1089 Median : 4.32979
Mean : 25.731 19468 : 253 Mean : 6.59012
3rd Qu.: 41.200 13795 : 241 3rd Qu.:12.90929
Max. :177.500 00403 : 236 Max. :17.99826
NA's :1 (Other):4523
day ABO
Min. : 2.722 A :2192
1st Qu.:1157.690 AB: 237
Median :2138.720 B : 807
Mean :2105.918 O :2851
3rd Qu.:3098.702
Max. :4016.860
Fit Cox-PH Model
Predicting time (time until offer) with censoring indicated by event. Using smooth terms (i.e., thin-plate regression splines) for WEIGHT_KG, AGE, and day. Using random effect terms for ABO and LISTING_CTR_STATUS. Including STATUS unpenalized. Parameters estimated using REML.
Note: positive coefficients represent fast time to offer. These are centers that have shorter time to offer than other centers (after controlling for age, weight, blood type, status, and year). The centers’ with negative coefficients have longer waiting until offer.
LC_effect: the coefficient for listing center in the cox-ph regression. Higher (positive) value means shorter wait time. Smaller (negative) value means longer wait time.
total: total number of candidates listed at center
Status {1A, 1B, 2}: number of candidates at each status
Code
(LC_effects = data_cox %>%count(LISTING_CTR_CODE, STATUS) %>%spread(STATUS, n, fill = 0L) %>%mutate(total =`Status 1A`+`Status 1B`+`Status 2`) %>%left_join( LC, by ="LISTING_CTR_CODE" ) %>%relocate(LC_effect, .after=1))
Code
LC_effects %>%write_csv("data/LC_effects.csv")
Median Wait Time
Estimates the median time from listing to first offer per center. Note that this ignores status changes; it only considers time from initial waitlisting to first offer. Many candidates change status before first offer, but this doesn’t account for it.
This is different than how the Listing Center effects are estimated in the above section which limits analysis to first status period.
Join waitlist data to PTR
Code
#: Get waitlist status and waitlist organ at time of match runWL_STATUS = PTR_all %>%select(PTR_ID, WL_ID_CODE, MATCH_SUBMIT_DT) %>%left_join( WL_hist, join_by(WL_ID_CODE, closest(MATCH_SUBMIT_DT > CHG_DT )) ) %>%select(PTR_ID, ORG_CAND = ORG, STATUS_AT_OFFER = STATUS)#: get date candidate added to waitlist# This should match thoracic$INIT_DATE and INIT_STAT# there are only a handful of cases where INIT_DATE < WL_DATE (mostly < 2004)cand_listed_date = WL_hist %>%filter(CHG_TY =="A") %>%# date activated on waitlistdistinct() %>%transmute( WL_ID_CODE, WL_DATE =as.Date(CHG_DT), STATUS_AT_LISTING = STATUS )#: add data to PTR PTR = PTR_all %>%# add PT_CODE, ORG_CAND, and STATUS_CAND to PTRleft_join(WL_STATUS, by="PTR_ID") %>%# add data candidate placed on waitlistleft_join(cand_listed_date, by ="WL_ID_CODE") # left_join(thoracic %>% select(WL_ID_CODE, STATUS_AT_LISTING = INIT_STAT, WL_DATE = INIT_DATE), # by = "WL_ID_CODE")#----------------------------------------------------------------------------### Make offer data ----# NOTES: # - ONLY offers from DONORS < 30 years old are included! # - Candidates < 18; on HR waitlist (not HL)# - Candidates added to waitlist between 2010 and 2019# - Our offer data is from 2010-march 2021, so there will be some# candidates still on waitlist at march 2021. #----------------------------------------------------------------------------#offers = PTR %>%# add year and if offered donor was eventually utilizedmutate(MATCH_RUN_YEAR =format(MATCH_SUBMIT_DT, "%Y"),donor_utilized =ifelse(is.na(MAX_ACCEPT_SEQ), 0, 1) ) %>%# get population of interestfilter( AGE_PT <18, # Pediatric Candidate ORG_CAND =="HR", # Candidate only on HR waitlist (not HL) WL_DATE >=as.Date("2010-01-01"), # listed after 2010 (so we have offer data) WL_DATE <=as.Date("2019-12-31"), # listed before 2020 ) %>%select( PTR_ID, WL_ID_CODE, LISTING_CTR_CODE, DONOR_ID, AGE_PT, ORG_CAND, STATUS_AT_OFFER, STATUS_AT_LISTING, WL_DATE, MATCH_SUBMIT_DT, MATCH_RUN_YEAR, INITIAL_RESPONSE_DT, OFFER_ACCEPT, PTR_SEQUENCE_NUM, donor_utilized )offers
Get waiting time until first offer
First offer time:
Code
#: calculate time to first offer# Notes:# - Using days between waitlist registration and match run.# Could use time at initial response, which may change by a day or two# but there are also some large outliers. first_offer = offers %>%slice_min(MATCH_SUBMIT_DT, by = WL_ID_CODE, with_ties =FALSE) %>%transmute( LISTING_CTR_CODE, WL_ID_CODE, STATUS_AT_LISTING, AGE_PT, PTR_SEQUENCE_NUM, WL_DATE,YR =format(WL_DATE, "%Y"),wait_days =as.numeric(as.Date(MATCH_SUBMIT_DT) - WL_DATE) )
Waiting time until first offer:
Code
#: summarize first offer waiting time by listing centerwaiting_time =left_join( first_offer %>%group_by(LISTING_CTR_CODE) %>%summarize(n_total =n(),median_wait_days =median(wait_days) ) %>%ungroup(), first_offer %>%group_by(LISTING_CTR_CODE, STATUS_AT_LISTING) %>%summarize(n =n(),median_wait_days =median(wait_days),.groups ="drop_last" ) %>%mutate(p = n/sum(n)) %>%ungroup() %>%complete(LISTING_CTR_CODE, STATUS_AT_LISTING, fill =list(n = 0L, p =0)) %>%mutate(STATUS_AT_LISTING =str_remove(STATUS_AT_LISTING, "Status ")) %>%pivot_wider(names_from =c(STATUS_AT_LISTING), values_from =c(n, p, median_wait_days), ),by ="LISTING_CTR_CODE")
Save results
n_total: total number of candidates at center
n_{1A, 1B, 2}: number of candidates at each status
p_{1A, 1B, 2}: proportion of candidates at each status
median_wait_days: overall median number of days until first offer
median_wait_days_{1A, 1B, 2}: median number of days until first offer (for status)
We have, for each waitlist, the number of times a donor was refused. However, there can be censoring due to the candidate being removed from the waitlist before an offer is accepted.
The approach is to model the number of refusals before acceptance in a survival analysis setting. Specifically, we are using a Kaplan-Meier estimator (by LISTING_CTR_CODE) to estimate the median number of refusals per candidate.
Notes:
We use a pseudo-bayesian approach to help estimatation for centers with few candidates. We take the aggregate known (uncensored) n_refusals and treat as additional data before fitting a KM.
We are not controlling/adjusting for candidate status, size, etc. We could by moving to a cox-ph or e.g., only using status 1A, etc.
Offer and Refusal data for each waitlist (in population)
Note: using offers dataframe from above section.
The n_refusals is the outcome and accept is the event/censoring indicator.
#: number of pseudo observationsk =3#: create pseudo observations; use frequency weighting wprior = refusals_WL %>%filter(accept) %>%count(n_refusals) %>%mutate(w = k*n/sum(n), accept=TRUE)#: replicate for all LISTING_CTR_CODESLC = refusals_WL$LISTING_CTR_CODE %>% uniqueprior_df =map(LC,\(x) prior %>%mutate(LISTING_CTR_CODE = x)) %>%bind_rows()#: Data for modelingdata_KM =bind_rows( refusals_WL %>%mutate(w=1), # set weights = 1 for original data prior_df # add on the pseudo observations )
Fit the KM model
Use fitted KM to estimate median and mean number of refusals before acceptance:
---title: "Center Level Statistics"format: html: toc: true toc-depth: 5 code-fold: true code-summary: "Code" code-tools: true # self-contained: true # embed-resources: true df-print: pagedtheme: lumen ---## Description1. Assess how much the `LISTING_CTR_CODE` impacts time to offer. The hypothesis is that more remote centers (less nearby donors) have longer wait times and consequentially worse waitlist outcomes. Another reason could be the center specifying tight range of when to make a match run (e.g., tight donor/candidate weight ratio). 2. Assess a center's acceptance practice. The hypothesis is that centers that are more selective in donors will have longer wait times until transplants (due to the more frequent refusals) and consequentially worse waitlist outcomes. ## Settings and Functions {.hidden .unnumbered .unlisted}```{r packages}#| include: falselibrary(mgcv)library(gratia)library(tidyverse)``````{r setup}#| include: falselibrary(tidyverse)knitr::opts_chunk$set(echo =TRUE,out.width ="60%", fig.width=8, fig.height=6,attr.output='style="max-height: 200px;"', # vertical scroll boxskimr_include_summary =FALSE)options(knitr.kable.NA ='') # set missing table values to blankoptions(width =90)theme_set(theme_bw())```## Load all required data ```{r load-all}#| include: falsedir_data ="data"PTR_all =read_rds(file.path(dir_data, "PTR.rds"))WL_hist =read_rds(file.path(dir_data, "cand_wl_hist.rds"))thoracic =read_rds(file.path(dir_data, "thoracic.rds"))```## Assess how much the `LISTING_CTR_CODE` impacts time to offer.This section builds a cox-ph model to estimate the center impact on time to offer. The model controls for several candidate variables that could impact time to offer: Age, Weight, Status, Blood Type, and Year. The Listing Center ID is treated as a random effect and the estimated coefficients are used to assess how much longer or shorter their candidates have to wait for an offer. There are two main reasons for center effects: the geographic location of the listing center and how narrowly it determines it wants to be notified for an offer (e.g., the min and max weight range of the donor to be placed on match run).There is one record per candidate related to the first time the candidate was placed on *active* status (not temporarily inactive). Candidate population: - On a HR waitlist (not HL) - Under 18 at time of offer - Added to the waitlist between 2010-01-01 and 2020-12-31 Note: we are using time from waitlist to the *match run time* when donor was offered to candidate. A better value would be the actual offer time, but the initial response data is not very reliable (e.g., there are some delays in entering the initial response time). ### Calculate waitlist time at statusOnly consider candidates on HR waitlists (not HL or LU)1. Get time candidate added to waitlist2. Get time ranges candidate is at each status3. Get time candidate removed from waitlist4. There can be multiple entries for each waitlist corresponding to multiple status changes during the patient's time on waitlist```{r WL_SURV}#| cache: true#| dependson: "load-all"WL_STATUS = WL_hist %>%#: only consider HR waitlist (not HL)filter(ORG =="HR") %>%#: calculate time at statusgroup_by(WL_ID_CODE) %>%arrange(CHG_DT) %>%filter(lag(STATUS) != STATUS |is.na(lag(STATUS)) | CHG_TY =="D") %>%transmute( WL_ID_CODE, STATUS, CHG_TY, # type of status change (A = Added, M = Modified, D = removal)WL_DT = CHG_DT,# date-time of start in new status (or added)# number of days at status (before change or removal)days_at_status =as.numeric(difftime(lead(CHG_DT), CHG_DT, units ="day")),seq =min_rank(CHG_DT), # sequence of status changesfinal = 1L*(lead(CHG_TY) =="D"), # indicator of final status change ) %>%ungroup() %>%#: Remove observation related to waitlist removalsfilter(CHG_TY !="D") %>%arrange(WL_ID_CODE, WL_DT)``````{r}#| echo: falseWL_STATUS```### Join the PTR (offer) data to the waitlist data- Only consider candidates < 18 (at time of offer).- Adds time of *next offer* the candidate received for each waitlist listing- If the `MATCH_RUN_DT` is after the next status change, then the offer doesn't occur in the given waitlist-status time (i.e., censoring)Outcome Variables:- `wait_days` is the number of (fractional) days from waitlist change to next offer- `event` is the binary indicator that an offer occurred before the next waitlist status change. If `event = 1` then an offer was given to candidate before the next status change and `event = 0` if the status change occurred first (i.e., censoring).- `time` is the elapsed time from status change to either offer or another status change. This is the main variable for modeling time to event.```{r WL-PTR}#: Join the PTR (offer) data to the waitlist dataWL_PTR =inner_join(x = WL_STATUS, y = PTR_all %>%filter(AGE_PT <18) %>%# only keep candidates < 18select(WL_ID_CODE, LISTING_CTR_CODE, MATCH_SUBMIT_DT, AGE = AGE_PT, PTR_ID),by =join_by("WL_ID_CODE", closest(WL_DT <= MATCH_SUBMIT_DT)) ) %>%mutate(wait_days =as.numeric(difftime(MATCH_SUBMIT_DT, WL_DT, units ="day")),event =ifelse(days_at_status >= wait_days |is.na(days_at_status), 1L, 0L), time =pmin(wait_days, days_at_status, na.rm=TRUE) )``````{r}#| echo: falseWL_PTR```### Make data to model time until offerWe only keep one record per candidate/waitlist so we don't overrepresent the candidates that had many refusals and offers. While there are few options, we decided to keep the time to event for the candidate's first active waitlist. Some candidates are initially listed as temporarily inactive. Predictor Variables:- `day` is the number of (fractional) days since 2010-01-01. Used to account for temporal trends in time to offer. - `ABO` is blood type- `WEIGHT_KG` is candidate's weight (at time of listing)- `AGE` is the candidate's AGE (at time of offer)- `STATUS` is the candidate's waitlist status (1A, 1B, 2) at time of offer.```{r data-for-modeling}#: get relevant data from thoracicthor = thoracic %>%transmute( WL_ID_CODE, WEIGHT_KG =coalesce(INIT_WGT_KG_CALC, WGT_KG_TCR), ABO =recode(ABO, A1 ="A", A2 ="A", A1B ="AB", A2B ="AB", .missing ="Unknown") )#: make data for modelingdata_cox = WL_PTR %>%#: get first *active* status for each candidate/waitlistfilter(STATUS !="Temporarily Inactive") %>%slice_min(WL_DT, by = WL_ID_CODE, with_ties =FALSE) %>%#: date rangesfilter(as.Date(WL_DT) >=as.Date("2010-01-01"), as.Date(WL_DT) <=as.Date("2020-12-31"), ) %>%#: Add predictor variablesleft_join(thor, by ="WL_ID_CODE") %>%# add thoracic data#: Select relevant data for modeling transmute(# waitlist info WL_ID_CODE, WL_DT,# outcomes time, event,# predictors WEIGHT_KG,LISTING_CTR_CODE =factor(LISTING_CTR_CODE), STATUS =factor(STATUS), AGE,day =as.numeric(difftime(WL_DT, as.Date("2010-01-01"), units ="day"))-1,ABO =factor(ABO) )summary(data_cox)```### Fit Cox-PH ModelPredicting `time` (time until offer) with censoring indicated by `event`. Using smooth terms (i.e., thin-plate regression splines) for `WEIGHT_KG`, `AGE`, and `day`. Using random effect terms for `ABO` and `LISTING_CTR_STATUS`. Including `STATUS` unpenalized. Parameters estimated using REML. ```{r cox-model}#| cache: truelibrary(mgcv)cox_gam = mgcv::gam(time ~ STATUS +s(WEIGHT_KG) +s(AGE) +s(day) +s(ABO, bs ="re") +s(LISTING_CTR_CODE, bs ="re"),weights = event, # censoring indicatorfamily = mgcv::cox.ph(), # Cox PH modelmethod ='REML',data = data_cox)summary(cox_gam)```### Get Listing Center EffectsNote: positive coefficients represent fast time to offer. These are centers that have shorter time to offer than other centers (after controlling for age, weight, blood type, status, and year). The centers' with negative coefficients have longer waiting until offer. ```{r LC-effects-calculate}library(gratia)LC =smooth_estimates(cox_gam, select ="s(LISTING_CTR_CODE)") %>%select( LISTING_CTR_CODE, LC_effect = .estimate ) %>%arrange(LC_effect)```Density estimation of LC effects: ```{r LC-coef-plot}LC %>%ggplot(aes(LC_effect)) +geom_density() +geom_rug()```### Save results- `LC_effect`: the coefficient for listing center in the cox-ph regression. Higher (positive) value means shorter wait time. Smaller (negative) value means longer wait time. - `total`: total number of candidates listed at center- `Status {1A, 1B, 2}`: number of candidates at each status```{r LC_effects}#| echo: TRUE(LC_effects = data_cox %>%count(LISTING_CTR_CODE, STATUS) %>%spread(STATUS, n, fill = 0L) %>%mutate(total =`Status 1A`+`Status 1B`+`Status 2`) %>%left_join( LC, by ="LISTING_CTR_CODE" ) %>%relocate(LC_effect, .after=1))``````{r save-LC-effects}LC_effects %>%write_csv("data/LC_effects.csv")```## Median Wait TimeEstimates the median time from listing to first offer per center. Note that *this ignores status changes*; it only considers time from initial waitlisting to first offer. Many candidates change status before first offer, but this doesn't account for it. This is different than how the Listing Center effects are estimated in the above section which limits analysis to first status period.### Join waitlist data to PTR```{r add-waitlist-data-to-PTR}#| cache: true#| dependson: "load-all"#: Get waitlist status and waitlist organ at time of match runWL_STATUS = PTR_all %>%select(PTR_ID, WL_ID_CODE, MATCH_SUBMIT_DT) %>%left_join( WL_hist, join_by(WL_ID_CODE, closest(MATCH_SUBMIT_DT > CHG_DT )) ) %>%select(PTR_ID, ORG_CAND = ORG, STATUS_AT_OFFER = STATUS)#: get date candidate added to waitlist# This should match thoracic$INIT_DATE and INIT_STAT# there are only a handful of cases where INIT_DATE < WL_DATE (mostly < 2004)cand_listed_date = WL_hist %>%filter(CHG_TY =="A") %>%# date activated on waitlistdistinct() %>%transmute( WL_ID_CODE, WL_DATE =as.Date(CHG_DT), STATUS_AT_LISTING = STATUS )#: add data to PTR PTR = PTR_all %>%# add PT_CODE, ORG_CAND, and STATUS_CAND to PTRleft_join(WL_STATUS, by="PTR_ID") %>%# add data candidate placed on waitlistleft_join(cand_listed_date, by ="WL_ID_CODE") # left_join(thoracic %>% select(WL_ID_CODE, STATUS_AT_LISTING = INIT_STAT, WL_DATE = INIT_DATE), # by = "WL_ID_CODE")#----------------------------------------------------------------------------### Make offer data ----# NOTES: # - ONLY offers from DONORS < 30 years old are included! # - Candidates < 18; on HR waitlist (not HL)# - Candidates added to waitlist between 2010 and 2019# - Our offer data is from 2010-march 2021, so there will be some# candidates still on waitlist at march 2021. #----------------------------------------------------------------------------#offers = PTR %>%# add year and if offered donor was eventually utilizedmutate(MATCH_RUN_YEAR =format(MATCH_SUBMIT_DT, "%Y"),donor_utilized =ifelse(is.na(MAX_ACCEPT_SEQ), 0, 1) ) %>%# get population of interestfilter( AGE_PT <18, # Pediatric Candidate ORG_CAND =="HR", # Candidate only on HR waitlist (not HL) WL_DATE >=as.Date("2010-01-01"), # listed after 2010 (so we have offer data) WL_DATE <=as.Date("2019-12-31"), # listed before 2020 ) %>%select( PTR_ID, WL_ID_CODE, LISTING_CTR_CODE, DONOR_ID, AGE_PT, ORG_CAND, STATUS_AT_OFFER, STATUS_AT_LISTING, WL_DATE, MATCH_SUBMIT_DT, MATCH_RUN_YEAR, INITIAL_RESPONSE_DT, OFFER_ACCEPT, PTR_SEQUENCE_NUM, donor_utilized )offers```### Get waiting time until first offerFirst offer time:```{r first-offer}#: calculate time to first offer# Notes:# - Using days between waitlist registration and match run.# Could use time at initial response, which may change by a day or two# but there are also some large outliers. first_offer = offers %>%slice_min(MATCH_SUBMIT_DT, by = WL_ID_CODE, with_ties =FALSE) %>%transmute( LISTING_CTR_CODE, WL_ID_CODE, STATUS_AT_LISTING, AGE_PT, PTR_SEQUENCE_NUM, WL_DATE,YR =format(WL_DATE, "%Y"),wait_days =as.numeric(as.Date(MATCH_SUBMIT_DT) - WL_DATE) )```Waiting time until first offer:```{r waiting-time}#: summarize first offer waiting time by listing centerwaiting_time =left_join( first_offer %>%group_by(LISTING_CTR_CODE) %>%summarize(n_total =n(),median_wait_days =median(wait_days) ) %>%ungroup(), first_offer %>%group_by(LISTING_CTR_CODE, STATUS_AT_LISTING) %>%summarize(n =n(),median_wait_days =median(wait_days),.groups ="drop_last" ) %>%mutate(p = n/sum(n)) %>%ungroup() %>%complete(LISTING_CTR_CODE, STATUS_AT_LISTING, fill =list(n = 0L, p =0)) %>%mutate(STATUS_AT_LISTING =str_remove(STATUS_AT_LISTING, "Status ")) %>%pivot_wider(names_from =c(STATUS_AT_LISTING), values_from =c(n, p, median_wait_days), ),by ="LISTING_CTR_CODE")```### Save results- `n_total`: total number of candidates at center- `n_{1A, 1B, 2}`: number of candidates at each status- `p_{1A, 1B, 2}`: proportion of candidates at each status- `median_wait_days`: overall median number of days until first offer- `median_wait_days_{1A, 1B, 2}`: median number of days until first offer (for status)```{r save-waiting_time.csv}waiting_time waiting_time %>%write_csv("data/waiting_time.csv")```## Refusal statistics We have, for each waitlist, the number of times a donor was refused. However, there can be censoring due to the candidate being removed from the waitlist before an offer is accepted.The approach is to model the number of refusals before acceptance in a survival analysis setting. Specifically, we are using a Kaplan-Meier estimator (by `LISTING_CTR_CODE`) to estimate the *median number of refusals per candidate*. Notes: - We use a pseudo-bayesian approach to help estimatation for centers with few candidates. We take the aggregate known (uncensored) n_refusals and treat as additional data before fitting a KM. - We are not controlling/adjusting for candidate status, size, etc. We could by moving to a cox-ph or e.g., only using status 1A, etc. ### Offer and Refusal data for each waitlist (in population)Note: using `offers` dataframe from above section.The `n_refusals` is the outcome and `accept` is the event/censoring indicator. ```{r refusals_WL}refusals_WL = offers %>%group_by(WL_ID_CODE, LISTING_CTR_CODE, STATUS_AT_LISTING) %>%summarize(n_offers =n_distinct(DONOR_ID),accept =any(OFFER_ACCEPT =="Y"),n_refusals = n_offers - accept,.groups ="drop" )refusals_WL```### Set-up "prior" data```{r KM-prior}#: number of pseudo observationsk =3#: create pseudo observations; use frequency weighting wprior = refusals_WL %>%filter(accept) %>%count(n_refusals) %>%mutate(w = k*n/sum(n), accept=TRUE)#: replicate for all LISTING_CTR_CODESLC = refusals_WL$LISTING_CTR_CODE %>% uniqueprior_df =map(LC,\(x) prior %>%mutate(LISTING_CTR_CODE = x)) %>%bind_rows()#: Data for modelingdata_KM =bind_rows( refusals_WL %>%mutate(w=1), # set weights = 1 for original data prior_df # add on the pseudo observations )```### Fit the KM model Use fitted KM to estimate median and mean number of refusals before acceptance: ```{r KM-fit}library(survival)KM_fit =survfit(Surv(n_refusals, accept) ~ LISTING_CTR_CODE, data = data_KM, weight = w) refusals_KM =summary(KM_fit)$table %>%as_tibble(rownames ="LISTING_CTR_CODE") %>%transmute(LISTING_CTR_CODE =str_remove(LISTING_CTR_CODE, "LISTING_CTR_CODE="),# records, # n = n.max - k, # events = events - k, refusals_mean = rmean, refusals_median = median )refusals_KM```### Save Data- `n_candidates`: number of candidates listed at center- `median_refusals`: median number of offer refusals per candidate- `median_refusals_old`: the orginal, biased, estimation that didn't account for censoring or small sample sizes- `mean_refusals`: mean number of offer refusals per candidate- `n_{offers, refusals}`: number of offers/refusals at center```{r save-refusal-data}refusals = refusals_WL %>%group_by(LISTING_CTR_CODE) %>%summarize(n_candidates =n_distinct(WL_ID_CODE),median_refusals_old =median(n_refusals),n_offers =sum(n_offers),n_refusals =sum(n_refusals),p_refusals = n_refusals / n_offers,# n_tx = sum(accept),# p_tx = mean(accept) ) %>%ungroup() %>%left_join( refusals_KM %>%rename(median_refusals = refusals_median, mean_refusals = refusals_mean ), by ="LISTING_CTR_CODE" )refusals refusals %>%write_csv("data/refusals.csv")``````{r}#| eval: false#| echo: false### Some other ideas. But dropped these to take censoring (KM) approach## Assume number of refusals is geometric distributed. # dgeom_median = floor(-1/log2(p_refusals))-1, # median from geometric rv## Shrink p_refusals# refusal_rate = 1 - mean(offers$OFFER_ACCEPT == "Y")# k = 5# p_refusals = mix of refusal_rate and p_observed# # Also see: Beta-Geometric Distributions``````{r}#| echo: falseknitr::knit_exit()```## Plots```{r}#----------------------------------------------------------------------------### Plots ----#----------------------------------------------------------------------------#theme_set(theme_bw())#: median time to offer vs. median number of refusalsfull_join(waiting_time, refusals, by ="LISTING_CTR_CODE") %>%filter(n_total >20) %>%ggplot(aes(median_wait_days, median_refusals)) +geom_smooth(color ="black", alpha = .20) +geom_point(aes(size = n_candidates), alpha = .50, pch =1) +scale_x_continuous(breaks =seq(0, 1000, by =7)) +scale_y_continuous(breaks =seq(0, 100, by =2)) +coord_cartesian(ylim =c(0, NA)) +labs(x ="median time to first offer (in days)", y ="median number of refusals per candidate",size ="# candidates" ) #: ECDF of time to first offer%>%%>%ggplot() +stat_ecdf(aes(x = median_wait_days_1A, color ="1A")) +stat_ecdf(aes(x = median_wait_days_1B, color ="1B")) +stat_ecdf(aes(x = median_wait_days_2, color ="2")) +scale_x_continuous(breaks =seq(0, 1000, by =7)) +scale_y_continuous(breaks =seq(0, 1, by = .10)) +coord_cartesian(xlim =c(0, 50)) +labs(x ="time to first offer (in days)", y ="Cumulative Probability",color ="Status" )#: center size vs. refusalsrefusals %>%filter(n_candidates >20) %>%ggplot(aes(n_candidates/10, median_refusals)) +geom_smooth(color ="black", alpha = .20) +geom_point() +scale_x_continuous(breaks =seq(0, 100, by =2)) +scale_y_continuous(breaks =seq(0, 100, by =2)) +labs(x ="avg number of candidates per year",y ="median number of refusals per candidate" )#: center size vs. wait timewaiting_time %>%filter(n_total >20) %>%ggplot(aes(n_total/10, median_wait_days)) +geom_smooth(color ="black", alpha = .20) +geom_point() +scale_x_continuous(breaks =seq(0, 100, by =2)) +scale_y_continuous(breaks =seq(0, 100, by =10)) +labs(x ="avg number of candidates per year",y ="median time until first offer (in days)" )set.seed(321)refusals %>%filter(n_candidates >20) %>%ggplot(aes(median_refusals, p_tx)) +geom_smooth(color ="black", alpha = .20) +geom_jitter(aes(size = n_candidates), alpha = .2, width = .125, height=0) +scale_x_continuous(breaks =seq(0, 1000, by =1)) +scale_y_continuous(breaks =seq(0, 1, by = .10)) +labs(x ="median number of refusals per candidate",y ="proportion of candidates transplanted", )```